Abstract

In this paper we provide a comprehensive performance assessment and comparison of various soft- 
output and hard-output demodulation schemes for multiple-input multiple-output (MIMO) bit-interleaved
coded modulation (BICM). Even though widely used in literature for demodulator comparison, coded 
bit error rate (BER) has the drawback of being strongly dependent on the outer error correcting code. 
This motivates us to propose a code-independent performance measure in terms of system capacity, i.e., 
mutual information of the equivalent modulation channel that comprises modulator, wireless channel, 
and demodulator. We present extensive numerical results for ergodic and quasi-static channels as well as
for the case of imperfect channel state information. These results reveal that the performance ranking of 
MIMO demodulators is rate-dependent. Furthermore, they provide new insights regarding MIMO-BICM 
system design, specifically the joint selection of demodulation scheme, antenna configuration, and symbol 
constellation for a prescribed target rate. 
I. Introduction

A. Background 

Bit-interleaved coded modulation (BICM) [2], [3] has been conceived as a pragmatic approach to 
coded modulation. It has received a lot of attention in wireless communications due to its bandwidth 
and power efficiency and its robustness against fading. For single-antenna systems, BICM with Gray 
labeling can approach channel capacity [2], [4]. These advantages have motivated extensions of BICM 
to multiple-input multiple-output (MIMO) systems [5]-[7]. 

In MIMO-BICM systems, the optimum demodulator is the soft-output maximum a posteriori (MAP) 
demodulator, which provides the channel decoder with log-likelihood ratios (LLRs) for the code bits.
Due to its high computational complexity, numerous alternative demodulators have been proposed in the
literature, the most important of which are briefly described below. Applying the max-log approximation [7]
to the MAP demodulator reduces complexity without significant performance loss and leads to a search for 
data vectors minimizing a Euclidean norm. Efficient exact implementations of the max-log MAP detector 
based on sphere decoding have been presented in [8]-[10]. The implementation complexity can be further 
reduced by replacing the Euclidean norm with the $\ell^\infty$ norm [11], [12]. However, the complexity of these
methods still grows exponentially with the number of transmit antennas. An alternative demodulator that 
approximates the max-log LLRs is based on semidefinite relaxation (SDR) and has polynomial worst-case 
complexity [13]. 

Several demodulation schemes use a list of candidate data vectors to deliver approximate LLR values.
Decreasing the size of the candidate list trades performance for complexity savings. A popular
way to generate such lists is to use tree search techniques, with the list sphere decoder (LSD) [14] being a
well-known example. Another approach to obtain a candidate list is based on lattice reduction (LR) [15], [16].
Variants of lattice-reduction aided list demodulators were proposed in [17]-[19]. Alternatively, candidate 
vectors can be obtained by "flipping" some of the bits in the bit label of the data vector obtained by 
(hard) maximum likelihood (ML) demodulation [20] (variants are obtained by replacing the ML detector 
with other hard-output detectors). 

MIMO demodulators with still smaller complexity consist of a linear equalizer followed by per-layer 
scalar soft demodulators. This approach has been studied using zero-forcing (ZF) [21], [22] and minimum 
mean-square error (MMSE) [23] equalization. The soft interference canceler (SoftIC) proposed in [24] 
iteratively performs per-layer demodulation, preceded by a subtraction of an interference estimate. This 
estimate is generated using soft symbols for the other layers obtained in the previous iteration. 
Low-complexity alternatives to soft demodulators are given by hard-output MIMO detectors that
provide tentative decisions for the code bits but no associated reliability information. Among the best- 
known schemes here are ML, ZF, and MMSE demodulation [25] and successive interference cancelation 
(SIC) [26]-[28]. 

B. Contributions 

In the context of MIMO-BICM, the performance of the above MIMO demodulators has usually been 
assessed numerically in terms of coded bit error rate (BER), with a specific choice for the outer channel 
code. However, such numerical BER results can change dramatically with different channel codes,¹
thereby rendering a meaningful demodulator comparison difficult. 

In this paper, we advocate an information theoretic approach for assessing the performance of (soft 
and hard) MIMO demodulators in the context of non-iterative (single-shot) BICM receivers (see also 
[1]). Following [5], we propose the mutual information between the code bits and the associated MIMO 
demodulator output as a code-independent performance measure. This quantity can be interpreted as 
system capacity (maximum rate allowing for error-free information recovery) for an equivalent "modu- 
lation" channel that comprises modulator and soft/hard demodulator in addition to the physical channel. 
This approach establishes a systematic framework for the assessment of current and future MIMO
demodulation schemes. We note that ZF-based and max-log demodulation have been compared in a 
similar spirit in [22]. 

Based on Monte-Carlo simulations of the proposed system capacity, this paper provides extensive 
performance evaluations and comparisons for the above-mentioned MIMO demodulators for various 
system configurations in fast fading and in quasi-static fading. We also investigate the performance 
loss of the various demodulation schemes in non-idealized scenarios where only estimates of the channel 
matrix and the noise variance are available at the receiver. Due to lack of space, only part of our numerical 
results are shown here. Additional results for other antenna configurations, symbol constellations, and 
symbol labelings can be found in a supporting document [29]. 

Our results allow for several interesting conclusions. Most importantly, we found that there is no 
universal performance ranking of MIMO demodulators, i.e., such a ranking depends on the code rate or, 
equivalently, on the signal-to-noise ratio (SNR). As an example, MMSE soft demodulation outperforms 
hard ML demodulation at low rates while at high rates it is the other way around. We also verify this
surprising observation in terms of BER simulations using low-density parity-check (LDPC) codes. Finally,
we use our numerical results to develop practical guidelines for the design of MIMO-BICM systems,
i.e., which antenna configuration, symbol constellation, and demodulator to prefer in order to achieve a
certain rate with minimum SNR.

¹Of course, performance also depends on system parameters such as antenna configuration, symbol constellation, and labeling.

March 16, 2009. DRAFT. SUBMITTED TO IEEE TRANSACTIONS ON SIGNAL PROCESSING, MAR. 2009

[Figure 1 placeholder: blocks labeled Encoder, Map., MIMO Channel, and Decoder; a bracket marks the equivalent "modulation" channel.]

Fig. 1. Block diagram of a MIMO-BICM system.



C. Paper Organization 

The rest of this paper is organized as follows. Section II discusses the MIMO-BICM system model
and Section III proposes system capacity as performance measure. In Sections IV and V we assess the
system capacity achievable with the MIMO-BICM demodulators referred to above for the case of fast
fading. Section VI analyzes the impact of imperfect channel state information (CSI) on demodulator
performance, and Section VII investigates the rate-versus-outage tradeoff of selected demodulators in
quasi-static environments. In Section VIII we summarize key observations and infer practical system
design guidelines. Finally, conclusions are provided in Section IX.



II. System Model 

A. MIMO-BICM Transmission Model 

A block diagram of our MIMO-BICM model is shown in Fig. 1. A sequence of information bits $b[q]$ is
encoded using an error-correcting code and is then passed through a bitwise interleaver $\Pi$. The interleaved
code bits are demultiplexed into $M_T$ antenna streams ("layers"). In each layer $k = 1, \dots, M_T$, groups
of $Q$ code bits $c_k^{(i)}[n]$, $i = 1, \dots, Q$ ($n$ denotes symbol time), are mapped via a one-to-one function
$\mu(\cdot)$ to (complex) data symbols $x_k[n]$ from a symbol alphabet $\mathcal{A}$ of size $|\mathcal{A}| = 2^Q$. Specifically, $x_k[n] =
\mu(c_k^{(1)}[n], \dots, c_k^{(Q)}[n])$, where $\{c_k^{(1)}[n], \dots, c_k^{(Q)}[n]\} = \mu^{-1}(x_k[n])$ is referred to as the bit label of $x_k[n]$.
The transmit vector is given by $\mathbf{x}[n] = (x_1[n] \dots x_{M_T}[n])^T \in \mathcal{A}^{M_T}$ and satisfies the power constraint
$E\{\|\mathbf{x}[n]\|^2\} = E_s$. It carries $R_0 = Q M_T$ interleaved code bits $c_l[n]$, $l = 1, \dots, R_0$, with $c_k^{(i)}[n] = c_{(k-1)Q+i}[n]$.
We will for simplicity write $\mathbf{x}[n] = \mu(c_1[n], \dots, c_{R_0}[n])$ and $\mathbf{c}[n] = (c_1[n] \dots c_{R_0}[n])^T = \mu^{-1}(\mathbf{x}[n])$ as
shorthand for the mapping $\mathbf{x}[n] = \big(\mu(c_1^{(1)}[n], \dots, c_1^{(Q)}[n]) \dots \mu(c_{M_T}^{(1)}[n], \dots, c_{M_T}^{(Q)}[n])\big)^T$ and its inverse.

Assuming flat fading, the receive vector $\mathbf{y}[n] = (y_1[n] \dots y_{M_R}[n])^T$ ($M_R$ denotes the number of receive
antennas) is given by

$$\mathbf{y}[n] = \mathbf{H}[n]\,\mathbf{x}[n] + \mathbf{v}[n], \quad n = 1, \dots, N, \qquad (1)$$

where $\mathbf{H}[n]$ is the $M_R \times M_T$ channel matrix, and $\mathbf{v}[n] = (v_1[n] \dots v_{M_R}[n])^T$ is a noise vector with
independent identically distributed (i.i.d.) circularly symmetric complex Gaussian elements with zero
mean and variance $\sigma_v^2$. In most of what follows, we will omit the time index $n$ for convenience.

At the receiver, the optimum demodulator uses the received vector $\mathbf{y}$ and the channel matrix $\mathbf{H}$ to
calculate LLRs $\Lambda_l$ for all code bits $c_l$, $l = 1, \dots, R_0$, carried by $\mathbf{x}$. In practice, the use of suboptimal
demodulators or of a channel estimate $\hat{\mathbf{H}}$ will result in approximate LLRs $\tilde{\Lambda}_l$. The LLRs are passed
through the deinterleaver and then on to the channel decoder that delivers the detected bits $\hat{b}[q]$.



B. Optimum Soft MAP Demodulation 

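Assuming i.i.d. uniform code bits (as guaranteed, e.g., by an ideal interleaver), the optimum soft MAP
demodulator calculates the exact LLR for $c_l$ based on $(\mathbf{y}, \mathbf{H})$ according to [7]

$$\Lambda_l \triangleq \log \frac{p(c_l = 1 \,|\, \mathbf{y}, \mathbf{H})}{p(c_l = 0 \,|\, \mathbf{y}, \mathbf{H})} = \log \frac{\sum_{\mathbf{x} \in \mathcal{X}_l^1} \exp\!\big(-\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 / \sigma_v^2\big)}{\sum_{\mathbf{x} \in \mathcal{X}_l^0} \exp\!\big(-\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 / \sigma_v^2\big)}. \qquad (2)$$

Here, $p(c_l \,|\, \mathbf{y}, \mathbf{H})$ is the probability mass function (pmf) of the code bits conditioned on $\mathbf{y}$ and $\mathbf{H}$, and
$\mathcal{X}_l^1$ and $\mathcal{X}_l^0$ denote the complementary sets of transmit vectors for which $c_l = 1$ and $c_l = 0$, respectively (note that
$\mathcal{A}^{M_T} = \mathcal{X}_l^1 \cup \mathcal{X}_l^0$). Unfortunately, computation of (2) has complexity $\mathcal{O}(|\mathcal{A}|^{M_T}) = \mathcal{O}(2^{R_0})$, i.e., exponential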
in the number of transmit antennas. For this reason, several suboptimal demodulators have been proposed 
which promise near-optimal performance while requiring a lower computational complexity. The aim of 
this work is to provide a fair performance comparison of these demodulators. 
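
To make the cost of (2) concrete, the following minimal sketch evaluates the exact LLRs by enumerating all $2^{R_0}$ transmit vectors. It is an illustration under stated assumptions, not the implementation used for the results in this paper: the Gray-labeled 4-QAM table `QAM4`, the list-of-lists channel matrix, and all function names are hypothetical choices made for this example.

```python
import itertools
import math

A = 1 / math.sqrt(2)
# Illustrative Gray-labeled 4-QAM (Q = 2 bits per symbol, unit symbol energy).
# This particular labeling is an assumption made for the example.
QAM4 = {(0, 0): complex(-A, -A), (0, 1): complex(-A, A),
        (1, 1): complex(A, A), (1, 0): complex(A, -A)}

def map_bits(bits):
    """Map R0 = 2*MT code bits to a transmit vector, two bits per layer."""
    return [QAM4[tuple(bits[k:k + 2])] for k in range(0, len(bits), 2)]

def sq_dist(y, H, x):
    """Squared Euclidean distance ||y - Hx||^2 for a list-of-lists matrix H."""
    return sum(abs(y_r - sum(h * s for h, s in zip(row, x))) ** 2
               for y_r, row in zip(y, H))

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def map_llrs(y, H, noise_var):
    """Exact LLRs as in (2), enumerating all 2^R0 transmit vectors."""
    r0 = 2 * len(H[0])
    # Per bit position: exponents collected for c_l = 0 and for c_l = 1.
    exps = [([], []) for _ in range(r0)]
    for bits in itertools.product((0, 1), repeat=r0):
        e = -sq_dist(y, H, map_bits(bits)) / noise_var
        for l, b in enumerate(bits):
            exps[l][b].append(e)
    return [log_sum_exp(e1) - log_sum_exp(e0) for e0, e1 in exps]
```

The loop over `itertools.product` visits $2^{R_0}$ labels, which is exactly the exponential growth in the number of transmit antennas noted above; for $4 \times 4$ MIMO with 4-QAM this is 256 metric evaluations per received vector.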



²The superscripts $^T$ and $^H$ denote transposition and Hermitian transposition, respectively. Furthermore, $\mathcal{A}^{M_T} = \mathcal{A} \times \dots \times \mathcal{A}$ is
the $M_T$-fold Cartesian product of $\mathcal{A}$, $E\{\cdot\}$ denotes expectation, and $\|\cdot\|$ is the $\ell^2$ (Euclidean) norm.
III. System Capacity

In order for the information rates discussed below to have interpretations as ergodic capacities, we 
consider a fast fading scenario where the channel H[n] is a stationary, finite-memory process. We recall 
that the ergodic capacity with Gaussian inputs is given by 

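$$C_G = E_{\mathbf{H}}\Big\{ \log_2 \det\Big( \mathbf{I} + \frac{E_s}{M_T \sigma_v^2}\, \mathbf{H}\mathbf{H}^H \Big) \Big\} \qquad (3)$$

(here, $\mathbf{I}$ denotes the identity matrix). The non-ergodic regime (slow fading) is discussed in Section III-D.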



A. Capacity of MIMO Coded Modulation 

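In a coded modulation (CM) system with equally likely transmit vectors $\mathbf{x} \in \mathcal{A}^{M_T}$ and no CSI at the
transmitter, the average mutual information in bits per channel use (bpcu) is given by (cf. [5])

$$C_{CM} \triangleq I(\mathbf{x}; \mathbf{y} \,|\, \mathbf{H}) = R_0 - E_{\mathbf{x},\mathbf{y},\mathbf{H}}\Big\{ \log_2 \frac{\sum_{\mathbf{x}' \in \mathcal{A}^{M_T}} f(\mathbf{y} \,|\, \mathbf{x}', \mathbf{H})}{f(\mathbf{y} \,|\, \mathbf{x}, \mathbf{H})} \Big\}. \qquad (4)$$

This expression involves the conditional probability density function (pdf) (cf. (1))

$$f(\mathbf{y} \,|\, \mathbf{x}, \mathbf{H}) = \frac{1}{(\pi \sigma_v^2)^{M_R}} \exp\Big( -\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{\sigma_v^2} \Big).$$

In the following, we will refer to $C_{CM}$ as CM capacity [2] (sometimes, $C_{CM}$ is alternatively termed
"constellation-constrained capacity"). It is seen from (4) that $C_{CM} \le R_0$ with the "raw" bit rate $R_0 =
Q M_T$; in fact, the last term in (4) may be interpreted as a penalty term resulting from the noise and
MIMO interference.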

Using the fact that the mapping between the symbol vector $\mathbf{x}$ and the associated bit label $\{c_1, \dots, c_{R_0}\}$
is one-to-one and applying the chain rule for mutual information [30] to (4) leads to

$$C_{CM} = I(c_1, \dots, c_{R_0}; \mathbf{y} \,|\, \mathbf{H}) = \sum_{l=1}^{R_0} I(c_l; \mathbf{y} \,|\, c_1, \dots, c_{l-1}, \mathbf{H}). \qquad (5)$$

The single-antenna equivalent of (5) served as a motivation for multilevel coding and multistage decoding,
which can indeed achieve CM capacity [4]. 



P. FERTL et al.: PERFORMANCE ASSESSMENT OF MIMO-BICM DEMODULATORS BASED ON SYSTEM CAPACITY



B. Capacity of MIMO-BICM 

Using the assumption of i.i.d. uniform code bits, the maximum rate achievable with BICM can be
shown to be given by (cf. [5])

$$C_{BICM} = \sum_{l=1}^{R_0} I(c_l; \mathbf{y} \,|\, \mathbf{H}), \qquad (6)$$

where $c_l \in \{0, 1\}$ with equal probability. Since conditioning cannot reduce mutual information, a comparison
of (5) and (6) reveals that

$$C_{BICM} \le C_{CM}.$$

The gap $C_{CM} - C_{BICM}$ increases with $|\mathcal{A}|$ and $M_T$ and depends strongly on the symbol labeling [7].
For single-antenna BICM systems with Gray labeling, this gap has been shown to be negligible [2], [4];
however, for MIMO it can be considerably larger (see Section IV).

It can be shown that $\Lambda_l$ in (2) is a sufficient statistic [30] for $c_l$ conditioned on $\mathbf{y}$ and $\mathbf{H}$.³ Therefore,
(6) can be rewritten as

$$C_{BICM} = \sum_{l=1}^{R_0} I(c_l; \Lambda_l). \qquad (7)$$

Hence, $C_{BICM}$ can be interpreted as the capacity of an equivalent channel with input $c_l$ and output $\Lambda_l$.
This channel is characterized by the conditional pdf $f(\Lambda_l \,|\, c_l)$, which, however, is usually hard to obtain
analytically.

³Note, however, that $\{\Lambda_1, \dots, \Lambda_{R_0}\}$ is not a sufficient statistic for $\{c_1, \dots, c_{R_0}\}$.

C. System Capacity and Demodulator Performance

Motivated by the interpretation of $C_{BICM}$ as the system capacity of BICM using the optimum MAP
demodulator, we propose to measure the performance of sub-optimal MIMO-BICM demodulators via the
system capacity of the associated equivalent "modulation" channel with discrete input $c_l$ and continuous
output $\tilde{\Lambda}_l$ (cf. Fig. 1). This channel is described by the conditional pdf $f(\tilde{\Lambda}_l \,|\, c_l)$. Its system capacity is
defined as the mutual information between $c_l$ and $\tilde{\Lambda}_l$, which can be shown to equal

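$$C \triangleq \sum_{l=1}^{R_0} I(c_l; \tilde{\Lambda}_l), \qquad (8a)$$

$$I(c_l; \tilde{\Lambda}_l) = \frac{1}{2} \sum_{b=0}^{1} \int f(\tilde{\Lambda}_l \,|\, c_l = b)\, \log_2 \frac{f(\tilde{\Lambda}_l \,|\, c_l = b)}{f(\tilde{\Lambda}_l)}\, d\tilde{\Lambda}_l, \qquad (8b)$$

where $f(\tilde{\Lambda}_l) = \frac{1}{2} \sum_{b=0}^{1} f(\tilde{\Lambda}_l \,|\, c_l = b)$. We emphasize that the system capacity $C$ provides a performance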
measure for MIMO soft demodulators that is independent of the outer channel code. In fact, it has an 
intuitive operational interpretation as the highest rate achievable (in the sense of asymptotically vanishing 
error probability) in a BICM system (cf. [2, Section III.A]) using the specific demodulator which produces 
$\tilde{\Lambda}_l$. Since $\tilde{\Lambda}_l$ is derived from $\mathbf{y}$ and $\mathbf{H}$, the data processing inequality [30] implies that $C \le C_{BICM}$ with
equality if $\tilde{\Lambda}_l$ is a one-to-one function of $\Lambda_l$. The performance of a soft demodulator can thus be measured
in terms of the gap $C_{BICM} - C$. Of course, the information theoretic performance measure in (8) does
not take into account complexity issues, and it has to be expected that a reduction of the gap $C_{BICM} - C$
can in general only be achieved at the expense of increased computational complexity.

We caution the reader that the rates in (7) and (8) are sums of mutual informations for the individual
code bits $c_1, \dots, c_{R_0}$ carried by one symbol vector. Indeed, the pdfs $f(\Lambda_l \,|\, c_l)$ and $f(\tilde{\Lambda}_l \,|\, c_l)$ in general
depend on the code bit position $l$, even though for certain systems (e.g., 4-QAM modulation) the code bit
protection and LLR statistics are independent of the bit position $l$ for reasons of symmetry. Achieving
(7) and (8) thus requires channel encoders and decoders that take the bit position into account. When
the channel code fails to use this information, the rate loss is small provided that the mapping protects
different code bits $c_l$ roughly equally against noise and interference.
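
As an illustration of how a per-bit mutual information of this kind can be estimated from simulated LLRs, the sketch below uses a plain equal-width histogram as a stand-in for the conditional pdfs. The function name, the binning rule, and the default of 50 bins are assumptions made for this example; a plug-in estimate of this type is biased for finite sample sizes, so it should be read as a starting point rather than a calibrated estimator.

```python
import math

def mi_from_llr_samples(llr_given_0, llr_given_1, n_bins=50):
    """Histogram estimate of I(c; LLR) in bits for an equiprobable code bit.

    llr_given_0 / llr_given_1: LLR samples observed when the transmitted
    code bit was 0 / 1. The equal-width binning is an illustrative choice.
    """
    lo = min(min(llr_given_0), min(llr_given_1))
    hi = max(max(llr_given_0), max(llr_given_1))
    width = (hi - lo) / n_bins or 1.0  # guard against identical samples

    def hist(samples):
        counts = [0] * n_bins
        for s in samples:
            idx = min(int((s - lo) / width), n_bins - 1)
            counts[idx] += 1
        return [c / len(samples) for c in counts]

    p0, p1 = hist(llr_given_0), hist(llr_given_1)
    mi = 0.0
    for a, b in zip(p0, p1):
        mix = 0.5 * (a + b)  # unconditional bin probability
        for p in (a, b):
            if p > 0:
                mi += 0.5 * p * math.log2(p / mix)
    return mi
```

Summing such estimates over the $R_0$ bit positions gives an estimate of the sum rate; keeping the positions separate reflects the remark above that the LLR statistics generally depend on the bit position.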

D. Non-ergodic channels 

In the case of quasi-static or slow fading [31], the channel H is random but constant over time, i.e., 
each codeword can extend over only one channel realization. In this regime, an ergodic system capacity 
of the equivalent modulation channel is no longer meaningful [31], [32]. Instead we consider the outage 
probability 

$$P_{out}(R) \triangleq P\{R_{\mathbf{H}} < R\}, \qquad (9)$$

where $R_{\mathbf{H}}$ is a random variable defined as

$$R_{\mathbf{H}} \triangleq \sum_{l=1}^{R_0} I_{\mathbf{H}}(c_l; \tilde{\Lambda}_l).$$

Here, $I_{\mathbf{H}}(c_l; \tilde{\Lambda}_l)$ denotes conditional mutual information, which is evaluated with $f(\tilde{\Lambda}_l \,|\, c_l, \mathbf{H})$ in place
of $f(\tilde{\Lambda}_l \,|\, c_l)$ (cf. (8)). Note that the ergodic system capacity $C$ in (8) equals $C = E_{\mathbf{H}}\{R_{\mathbf{H}}\}$. The outage
probability $P_{out}(R)$ can be interpreted as the smallest probability of error achievable at rate $R$ [32]. A
closely related concept is given by the $\epsilon$-capacity of the equivalent modulation channel, defined as

$$C_\epsilon \triangleq \sup\,\{R \;|\; P\{R_{\mathbf{H}} < R\} < \epsilon\}. \qquad (10)$$

The $\epsilon$-capacity may be interpreted as the maximum rate for which a probability of error less than $\epsilon$ can
be achieved. Rates smaller than $C_\epsilon$ are referred to as $\epsilon$-achievable rates [32]. If $P_{out}(R)$ is a continuous
and increasing function of $R$ (which is usually the case in practice), it holds that $P_{out}(C_\epsilon) = \epsilon$.
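
Given Monte-Carlo samples of $R_{\mathbf{H}}$ (one per channel realization), empirical counterparts of (9) and (10) follow directly from the sorted samples. The sketch below assumes such samples are already available; the function names are hypothetical.

```python
import math

def outage_probability(rates, target_rate):
    """Empirical P{R_H < R} from samples of the instantaneous rate R_H."""
    return sum(r < target_rate for r in rates) / len(rates)

def epsilon_capacity(rates, eps):
    """Empirical sup{R : P{R_H < R} < eps}, attained at an order statistic."""
    if not 0.0 < eps <= 1.0:
        raise ValueError("eps must lie in (0, 1]")
    ordered = sorted(rates)
    # Largest k with k/n < eps; the supremum is the (k+1)-th smallest sample.
    k = math.ceil(eps * len(ordered)) - 1
    return ordered[k]
```

Because the empirical distribution is a step function, the empirical outage probability at the returned rate is strictly below `eps` rather than equal to it; the equality $P_{out}(C_\epsilon) = \epsilon$ only holds in the continuous case described above.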

IV. Baseline MIMO-BICM Demodulators 

In this section, we first review max-log and hard ML demodulation as well as linear MIMO demodulators,
and then we provide results illustrating their performance in terms of system capacities. These
demodulators serve as baseline systems for later demodulator performance comparisons in Section V.
We note that max-log and hard ML MIMO demodulators have the highest complexity among all soft 
and hard demodulation schemes, respectively, whereas linear MIMO demodulators are computationally 
the most efficient ones. 



A. Max-Log and Hard ML Demodulator

Applying the max-log approximation to (2) simplifies the LLR computation to a minimum distance
problem, resulting in the approximate LLRs [7]

$$\tilde{\Lambda}_l = \frac{1}{\sigma_v^2} \Big[ \min_{\mathbf{x} \in \mathcal{X}_l^0} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \min_{\mathbf{x} \in \mathcal{X}_l^1} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 \Big]. \qquad (11)$$

Eq. (11) facilitates (hardware) implementation since it avoids the logarithm and exponential functions in
(2). However, computation of $\tilde{\Lambda}_l$ in (11) still requires two searches over sets of size $|\mathcal{X}_l^1| = |\mathcal{X}_l^0| = 2^{R_0 - 1}$.
Efficient sphere decoder implementations of (11) are presented in [8], [10].
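
The minimum-distance form of the max-log rule can be sketched as follows, assuming the metrics $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$ have already been computed for every bit label. The dictionary layout and the function names are illustrative assumptions; this brute-force version is not the sphere-decoder implementation of [8], [10].

```python
def max_log_llrs(metrics, noise_var):
    """Max-log LLRs from a table {bit-label tuple: ||y - Hx||^2}.

    For each bit position l, the best metric with c_l = 0 is compared
    against the best metric with c_l = 1.
    """
    r0 = len(next(iter(metrics)))
    llrs = []
    for l in range(r0):
        d0 = min(d for bits, d in metrics.items() if bits[l] == 0)
        d1 = min(d for bits, d in metrics.items() if bits[l] == 1)
        llrs.append((d0 - d1) / noise_var)
    return llrs

def hard_ml_bits(metrics):
    """Bit label of the overall metric minimizer (hard ML decision)."""
    return min(metrics, key=metrics.get)
```

For any metric table with a unique minimizer, the label returned by `hard_ml_bits` has a 1 exactly where the corresponding max-log LLR is positive, since the overall minimizer attains one of the two minima at every bit position.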
Hard vector ML demodulation can be performed by solving

$$\hat{\mathbf{x}}_{ML} = \arg\min_{\mathbf{x} \in \mathcal{A}^{M_T}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2. \qquad (12)$$

The corresponding detected code bits $\hat{c}_l$ are then obtained via the one-to-one mapping between code
bits and symbol vectors, i.e., $\hat{\mathbf{c}} = (\hat{c}_1 \dots \hat{c}_{R_0})^T = \mu^{-1}(\hat{\mathbf{x}}_{ML})$. It can be shown that the code bits $\hat{c}_l$
obtained by the hard ML detector correspond to the sign of the corresponding max-log LLRs in (11),
i.e., $\hat{c}_l = u(\tilde{\Lambda}_l)$, where $u(\cdot)$ denotes the unit step function. When it comes to evaluating the system capacity
with hard-output demodulators, the only difference to soft-output demodulation is the discrete nature of
the outputs $\hat{c}_l$ of the equivalent "modulation" channel, which here becomes a binary symmetric channel
(BSC). Consequently, the integral over $\tilde{\Lambda}_l$ in (8b) is replaced with a summation over $\hat{c}_l \in \{0, 1\}$.
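
For a BSC with equiprobable inputs, the summation reduces to the familiar closed form $1 - h(p_0)$, with $h(\cdot)$ the binary entropy function and $p_0$ the cross-over probability. The following sketch sums this quantity over the bit positions; the function names are hypothetical, and the per-bit cross-over probabilities are assumed to come from a separate simulation.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_rate(p0):
    """Mutual information of a BSC with uniform input and cross-over p0."""
    return 1.0 - binary_entropy(p0)

def hard_system_capacity(crossovers):
    """Sum of per-bit BSC rates, one cross-over probability per bit position."""
    return sum(bsc_rate(p) for p in crossovers)
```

As a rough orientation, a cross-over probability of about 0.11 on each of $R_0 = 8$ bit positions corresponds to a sum rate of about 4 bpcu.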

B. Linear Demodulators 

In the following, $\tilde{\Lambda}_k^{(i)}$ is the LLR corresponding to $c_k^{(i)}$ (the $i$th bit in the bit label of the $k$th symbol
$x_k$). Soft demodulators with extremely low complexity can be obtained by using a linear (ZF or MMSE)
equalizer followed by per-layer max-log LLR calculation according to

$$\tilde{\Lambda}_k^{(i)} = \frac{1}{\sigma_k^2} \Big[ \min_{x \in \mathcal{A}_i^0} |\hat{x}_k - x|^2 - \min_{x \in \mathcal{A}_i^1} |\hat{x}_k - x|^2 \Big], \quad i = 1, \dots, Q, \; k = 1, \dots, M_T. \qquad (13)$$

Here, $\mathcal{A}_i^b \subset \mathcal{A}$ denotes the set of (scalar) symbols whose bit label at position $i$ equals $b$, $\hat{x}_k$ is an estimate
of the symbol in layer $k$ provided by the equalizer, and $\sigma_k^2$ is an equalizer-specific weight (see below). We
emphasize that calculating LLRs separately for each layer results in a significant complexity reduction.
In fact, calculating the symbol estimates $\hat{x}_k$ using a ZF or MMSE equalizer requires $\mathcal{O}(M_T^3)$ operations;
furthermore, the complexity of evaluating (13) for all code bits scales as $\mathcal{O}(M_T 2^Q)$, i.e., linearly in the
number of transmit antennas.

Equalization-based hard bit decisions can be obtained by quantization of the equalizer output $\hat{x}_k$
with respect to $\mathcal{A}$ (denoted by $\mathcal{Q}(\cdot)$), followed by the demapping, i.e., $(\hat{c}_k^{(1)} \dots \hat{c}_k^{(Q)})^T = \mu^{-1}(\mathcal{Q}(\hat{x}_k))$.
Again, the detected code bits correspond to the sign of the LLRs, i.e., $\hat{c}_k^{(i)} = u(\tilde{\Lambda}_k^{(i)})$.
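
The per-layer LLR computation and the corresponding hard decision can be sketched for a single layer as follows. The Gray-labeled 4-QAM table, the function names, and the way the equalizer output and weight are passed in are assumptions made for this example; the equalizer itself is not implemented here.

```python
import math

A = 1 / math.sqrt(2)
# Illustrative Gray-labeled 4-QAM alphabet (an assumption for this example).
QAM4 = {(0, 0): complex(-A, -A), (0, 1): complex(-A, A),
        (1, 1): complex(A, A), (1, 0): complex(A, -A)}

def per_layer_llrs(x_hat, weight, alphabet=QAM4):
    """Per-layer max-log LLRs for one equalized symbol x_hat.

    weight plays the role of the equalizer-specific factor sigma_k^2.
    """
    q = len(next(iter(alphabet)))
    llrs = []
    for i in range(q):
        d0 = min(abs(x_hat - s) ** 2 for b, s in alphabet.items() if b[i] == 0)
        d1 = min(abs(x_hat - s) ** 2 for b, s in alphabet.items() if b[i] == 1)
        llrs.append((d0 - d1) / weight)
    return llrs

def hard_bits(x_hat, alphabet=QAM4):
    """Quantize x_hat to the nearest symbol and return its bit label."""
    return min(alphabet, key=lambda b: abs(x_hat - alphabet[b]))
```

Each call touches only the $2^Q$ scalar symbols of one layer, which is why the total effort over all layers grows linearly with $M_T$.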

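1) ZF-based Demodulator [21], [22]: Here, the first stage consists of ZF equalization, i.e.,

$$\hat{\mathbf{x}}_{ZF} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{y} = \mathbf{x} + \tilde{\mathbf{v}}, \qquad (14)$$

where the post-equalization noise vector $\tilde{\mathbf{v}}$ has correlation matrix

$$\mathbf{R}_{\tilde{\mathbf{v}}} = E\{\tilde{\mathbf{v}} \tilde{\mathbf{v}}^H\} = \sigma_v^2 (\mathbf{H}^H \mathbf{H})^{-1}. \qquad (15)$$

Subsequently, approximate bit LLRs are obtained according to (13) with symbol estimate $\hat{x}_k = (\hat{\mathbf{x}}_{ZF})_k$
and weight factor $\sigma_k^2 = (\mathbf{R}_{\tilde{\mathbf{v}}})_{k,k}$.⁴

⁴By $(\mathbf{x})_k$ and $(\mathbf{X})_{k,l}$ we respectively denote the $k$th element of the vector $\mathbf{x}$ and the element in row $k$ and column $l$ of the
matrix $\mathbf{X}$.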
2) MMSE-based Demodulator [23]: Here, the first stage is an MMSE equalizer that can be written
as (cf. (14) and (15))

$$\hat{\mathbf{x}}_{MMSE} = \mathbf{W} \hat{\mathbf{x}}_{ZF}, \quad \text{with} \quad \mathbf{W} = \Big( \mathbf{I} + \frac{M_T}{E_s} \mathbf{R}_{\tilde{\mathbf{v}}} \Big)^{-1}. \qquad (16)$$

Approximate LLRs are then calculated according to (13) with

$$\hat{x}_k = \frac{(\hat{\mathbf{x}}_{MMSE})_k}{W_k}, \qquad \sigma_k^2 = \frac{E_s}{M_T} \, \frac{1 - W_k}{W_k},$$

where $W_k = (\mathbf{W})_{k,k}$. We emphasize that throughout this paper the results for soft (and hard) MMSE
demodulation are based on the unbiased MMSE equalizer output $(\hat{\mathbf{x}}_{MMSE})_k / W_k$. In spite of similar complexity,
MMSE-based demodulation outperforms ZF-based demodulation substantially [23].



C. Capacity Results 

We next compare the performance of the above baseline demodulators (i.e., max-log, hard ML, MMSE,
and ZF) in terms of their associated maximum achievable rate $C$ (ergodic system capacity) in (8). In
addition, the CM capacity $C_{CM}$ in (4), the MIMO-BICM capacity $C_{BICM}$ in (6), and the Gaussian input
channel capacity $C_G$ in (3) are shown as benchmarks. Throughout the paper, all capacity results have
been obtained for spatially i.i.d. Rayleigh fading, with all fading coefficients normalized to unit variance.

The pdfs required for evaluating (8b) are generally hard to obtain in closed form. Thus, we measured
these pdfs using Monte-Carlo simulations and then evaluated all integrals numerically. Based on the
results in [33], we numerically optimized the binning (used to measure the pdfs) in order to reduce the
bias and variance of the mutual information estimates. The capacity results (obtained with 10^ fading
realizations) are shown in Fig. 2 in bits per channel use versus SNR $\rho = E_s / \sigma_v^2$. In the following, in some
of the plots we show insets that provide zooms of the capacity curves around a target rate of $R_0/2$ bpcu.

Fig. 2(a) pertains to the case of $M_T \times M_R = 4 \times 4$ MIMO with Gray-labeled 4-QAM (here, $R_0 = 8$).
At a target rate of 4 bpcu, the SNR required for CM and Gaussian capacity is virtually the same, whereas
that for BICM is larger by about 1.3 dB. The SNR penalty of using max-log demodulation instead of soft
MAP is about 0.3 dB. Furthermore, hard ML demodulation requires 2.1 dB higher SNR to achieve this
rate than max-log demodulation; for soft and hard MMSE demodulation the SNR gaps to max-log are
1.2 dB and 4.1 dB, respectively, while for soft and hard ZF demodulation they respectively equal 5.1 dB
and 8.2 dB. An interesting observation in this scenario is the fact that at low rates, soft and hard MMSE
demodulation essentially coincide with max-log and hard ML demodulation, respectively, whereas at high
rates MMSE demodulation approaches ZF performance. Moreover, soft MMSE demodulation outperforms
hard ML demodulation at low rates whereas at high rates it is the other way around. Surprisingly, at very
low rates soft MMSE even performs slightly better than max-log demodulation. A similar cross-over of
the capacity curves occurs with hard MMSE and soft ZF demodulation. These observations reveal the
somewhat unexpected fact that the demodulator performance ranking is not universal but depends on the
target rate (equivalently, the target SNR), even if the number of antennas, the symbol constellation, and
the labeling are fixed. Similar observations apply to 16-QAM instead of 4-QAM and to set-partitioning
labeling instead of Gray labeling (see [29]). Apart from a general shift of all curves to higher SNRs,
the larger constellation and/or the different labeling strategy causes an increase of the gap between
CM capacity and BICM capacity. When reducing the antenna configuration to a $2 \times 2$ system, we
observed that soft ZF outperforms hard ML demodulation for low-to-medium rates, e.g., by about 1.7 dB
at 4 bpcu with 16-QAM [29]. In this rate regime, even hard MMSE performs slightly better than hard
ML demodulation.

The situation changes for the case of a $2 \times 4$ MIMO system with Gray-labeled 16-QAM (again
$R_0 = 8$), shown in Fig. 2(b). The increased SNR gap between CM and BICM capacity implied by
the larger constellation is compensated by having more receive than transmit antennas (this agrees with
observations in [7]). In addition, the performance differences between the individual demodulators are
significantly reduced, revealing that the essential distinction is between soft and hard demodulators. Having
$M_R > M_T$ helps the linear demodulators approach their non-linear counterparts even at larger rates, i.e.,
soft ZF/MMSE perform close to max-log and hard ZF/MMSE perform close to hard ML, with an SNR
gap of about 2.3 dB between hard and soft demodulators. Note that in this scenario soft MMSE and soft
ZF both outperform hard ML demodulation at all rates.

D. BER Performance 

Even though we advocate a demodulator comparison in terms of system capacity, the cross-over of some 
of the capacity curves prompts a verification in terms of the BER of soft and hard MMSE demodulation 
as well as max-log and hard ML demodulation. We consider a $4 \times 4$ MIMO-BICM system with Gray-labeled
4-QAM in conjunction with regular LDPC codes⁵ [34] of block length 64000. For the case of soft
demodulation, the LDPC codes were designed for an additive white Gaussian noise channel whereas for 
the case of hard demodulators the design was for a BSC. At the receiver, message-passing LDPC decoding 
[34] was performed. In the case of hard demodulation, the message-passing decoder was provided with 
the LLRs

$$\tilde{\Lambda}_l = (2\hat{c}_l - 1) \log \frac{1 - p_0}{p_0}, \qquad (17)$$

where $p_0 = P\{\hat{c}_l \ne c_l\}$ is the cross-over probability of the equivalent BSC, which was determined via
Monte-Carlo simulations. Using (17) is critical in order to provide the LDPC decoder with appropriate
reliability information.
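
The conversion from hard decisions to decoder input LLRs can be sketched as follows. The helper that estimates the cross-over probability from known transmitted bits stands in for the Monte-Carlo step; both function names are hypothetical.

```python
import math

def estimate_crossover(tx_bits, rx_bits):
    """Fraction of detected bits that differ from the transmitted bits."""
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)

def hard_decision_llr(c_hat, p0):
    """LLR for a hard decision c_hat in {0, 1} on a BSC with cross-over p0."""
    return (2 * c_hat - 1) * math.log((1 - p0) / p0)
```

All hard decisions receive the same LLR magnitude $\log((1 - p_0)/p_0)$; only the sign carries the detector's decision, and the magnitude shrinks toward zero as $p_0$ approaches 1/2.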

The BERs obtained are shown in Fig. 3(a) for a code rate of 1/4 (corresponding to 2 bpcu) and in Fig. 3(b)
for a code rate of 3/4 (6 bpcu). We also indicate by vertical lines the respective capacity limits, i.e., the
minimum SNR required for the target rate according to Fig. 2(a). It is seen that all our LDPC code designs
are less than 1 dB away from the capacity limits. For both rates and at all SNRs, max-log demodulation
performs best and hard MMSE demodulation performs worst. No such universal statements can be made
for hard ML and soft MMSE demodulation. Specifically, at rate 1/4 soft MMSE outperforms hard ML
demodulation by 2.3 dB and is only 0.3 dB away from max-log (cf. Fig. 3(a)); however, at rate 3/4 soft MMSE
performs 2 dB poorer than hard ML and 3.6 dB poorer than max-log (see Fig. 3(b)). These results confirm
the capacity-based observation that there is no universal (i.e., rate- and SNR-independent) demodulator
performance ranking. We note that the block error rate simulations in [35] allow for similar conclusions,
even though not explicitly mentioned in that paper.

V. Other Demodulators 

In the following, we study the system capacity of several other MIMO-BICM demodulators that differ
in their underlying principle and their computational complexity. Unless stated otherwise, capacity results
in this section pertain to a $4 \times 4$ MIMO system with 4-QAM using Gray labeling ($R_0 = 8$). We note that
the results for asymmetric $2 \times 4$ MIMO systems with 16-QAM (shown in [29] but not here) essentially
confirm the general distinction between hard and soft demodulators observed in connection with Fig. 2(b).

⁵The LDPC code design was performed using the web tool at http://lthcwww.epfl.ch/research/ldpcopt

Fig. 3. BER vs. SNR for a $4 \times 4$ MIMO system with Gray-labeled 4-QAM and LDPC codes with (a) rate 1/4 and (b) rate 3/4.



A. List-based Demodulators 

In order to save computational complexity, (11) can be approximated by decreasing the size of the
search set $\mathcal{A}^{M_T}$. Usually, this is achieved by generating a (non-empty) candidate list $\mathcal{L} \subseteq \mathcal{A}^{M_T}$ and
restricting the search in (11) to this list, i.e.,

$$\tilde{\Lambda}_l = \frac{1}{\sigma_v^2} \Big[ \min_{\mathbf{x} \in \mathcal{L} \cap \mathcal{X}_l^0} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \min_{\mathbf{x} \in \mathcal{L} \cap \mathcal{X}_l^1} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 \Big]. \qquad (18)$$

The computational complexity of the metric evaluations and minima searches in (18) scales as
$\mathcal{O}(M_T M_R |\mathcal{L}|)$. Thus, the list size $|\mathcal{L}|$ allows trading off performance for complexity savings. A larger list
size generally incurs higher complexity but yields more accurate approximations of the max-log LLRs.
For fixed list size, the performance can further depend strongly on which symbol vectors are actually
included in $\mathcal{L}$. In the following, we consider two types of list-based demodulation where the candidate
list is obtained using sphere decoding and bit flipping, respectively.

1) List Sphere Decoder (LSD): The LSD proposed in [14] uses a simple modification of the hard-decision
sphere decoder [36] to generate the candidate list $\mathcal{L}$ such that it contains the $|\mathcal{L}|$ symbol vectors
$\mathbf{x}$ with the smallest ML metric $\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2$ (thus, by definition $\mathcal{L}$ contains the hard ML solution $\hat{\mathbf{x}}_{ML}$ in
(12)). If the $l$th bit in the labels of all $\mathbf{x} \in \mathcal{L}$ equals 1, the set $\mathcal{L} \cap \mathcal{X}_l^0$ is empty and (18) cannot be
evaluated. Since in this case there is strong evidence for $c_l = 1$ (at least if $|\mathcal{L}|$ is not too small), the
corresponding LLR $\tilde{\Lambda}_l$ is set to a prescribed positive value $\bar{\Lambda} \gg 0$. Analogously, $\tilde{\Lambda}_l = -\bar{\Lambda}$ in case $\mathcal{L} \cap \mathcal{X}_l^1$
equals the empty set.

While the LSD may offer significant complexity savings compared to max-log demodulation, statements
about its computational complexity are difficult and depend strongly on the actual implementation of the
sphere decoder as well as the choice of the list size. We note that the case $|\mathcal{L}| = 2^{R_0} = |\mathcal{A}^{M_T}|$ implies
$\mathcal{L} = \mathcal{A}^{M_T}$; thus, $\mathcal{L} \cap \mathcal{X}_l^b = \mathcal{X}_l^b$ such that (18) equals the max-log demodulator in (11). The other extreme
is a list size of one, i.e., $\mathcal{L} = \{\hat{\mathbf{x}}_{ML}\}$ (cf. (12)), in which case either $\mathcal{L} \cap \mathcal{X}_l^0$ or $\mathcal{L} \cap \mathcal{X}_l^1$ is empty (depending
on the bit label of $\hat{\mathbf{x}}_{ML}$); here, $\tilde{\Lambda}_l = (2\hat{c}_l - 1)\bar{\Lambda}$ where $\hat{\mathbf{c}} = (\hat{c}_1 \dots \hat{c}_{R_0})^T = \mu^{-1}(\hat{\mathbf{x}}_{ML})$ and thus the LSD
output is equivalent to hard ML demodulation (except for the choice of $\bar{\Lambda}$, which is irrelevant, however,
for capacity).
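
The list-restricted LLR rule, including the clipping convention for empty sets, can be sketched as follows. For simplicity, the candidate list is obtained here by sorting a complete metric table, which stands in for the sphere-decoder search; the function name and the default clipping value of 20 are assumptions made for this example.

```python
def list_llrs(metrics, noise_var, list_size, clip=20.0):
    """List-based max-log LLRs from a table {bit-label tuple: ||y - Hx||^2}.

    The list holds the list_size labels with the smallest metrics. If no
    list entry has c_l = 0 (or c_l = 1), the LLR is set to +clip (or -clip).
    """
    candidates = sorted(metrics, key=metrics.get)[:list_size]
    r0 = len(candidates[0])
    llrs = []
    for l in range(r0):
        d0 = [metrics[b] for b in candidates if b[l] == 0]
        d1 = [metrics[b] for b in candidates if b[l] == 1]
        if not d0:        # every candidate has c_l = 1
            llrs.append(clip)
        elif not d1:      # every candidate has c_l = 0
            llrs.append(-clip)
        else:
            llrs.append((min(d0) - min(d1)) / noise_var)
    return llrs
```

With a list size of one, every LLR equals plus or minus the clipping value and only the hard ML decision survives; with the full list, the output coincides with the max-log LLRs.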

Capacity Results. Fig. 4 shows the maximum rates achievable with an LSD for various list sizes. BICM
and soft MMSE capacity are shown for comparison. Note that with 4-QAM and $M_T = 4$, $|\mathcal{L}| = 256$ and
$|\mathcal{L}| = 1$ correspond to max-log and hard ML demodulation, respectively. It is seen that with increasing
list size the gap between LSD and max-log decreases rapidly, specifically at high rates. In particular, the
LSD with list sizes of $|\mathcal{L}| \ge 8$ is already quite close to max-log performance. However, even with large
list sizes LSD is outperformed by soft MMSE demodulation at sufficiently low rates. Specifically, below
3.6 dB, 1.5 dB, and −0.2 dB the system capacity of soft MMSE demodulation is higher than that of LSD
with list size 2, 4, and 8, respectively. Similar observations apply to other antenna configurations and
symbol constellations (see [29]).

Fig. 4. System capacity of LSD with list size $|\mathcal{L}| \in \{1, 2, 4, 8, 256\}$ (4x4 MIMO, 4-QAM, Gray labeling).

2) Bit Flipping Demodulators: Another way of generating the candidate list $\mathcal{L}$, proposed in [20], is to
flip some of the bits in the label of the hard ML symbol vector estimate $\hat{x}_{\mathrm{ML}}$ in (12). More generally, the
ML solution $\hat{x}_{\mathrm{ML}}$ can be replaced by a symbol vector $\hat{x} \in \mathcal{A}^{M_T}$ obtained with an arbitrary hard-output
demodulator (e.g., hard ZF and MMSE demodulation). Let $c = \mu^{-1}(\hat{x})$ denote the bit label of $\hat{x}$. The
candidate list then consists of all symbol vectors whose bit label has Hamming distance at most $D \le R_0$
from $c$, i.e., $\mathcal{L} = \{x : d_H(\mu^{-1}(x), c) \le D\}$. Here, $d_H(c_1, c_2)$ denotes the Hamming distance between two
bit labels $c_1$ and $c_2$. This list can be generated by systematically flipping up to $D$ bits in $c$ and mapping
the results to symbol vectors. The resulting list size is given by $|\mathcal{L}| = \sum_{d=0}^{D} \binom{R_0}{d}$. Here, the structure
of the list generated with bit flipping makes it possible to reduce the complexity per candidate to $O(M_R)$, giving
an overall complexity of $O(M_R |\mathcal{L}|)$ (plus the operations required for the initial estimate). For $D = R_0$,
$\mathcal{L} = \mathcal{A}^{M_T}$ and (18) reduces to max-log demodulation; furthermore, with $\hat{x} = \hat{x}_{\mathrm{ML}}$ and $D = 0$ we have
$\mathcal{L} = \{\hat{x}_{\mathrm{ML}}\}$ and (18) becomes equivalent to hard ML demodulation. Note that in contrast to the LSD,
bit flipping with $D > 0$ ensures that $\mathcal{L} \cap \mathcal{X}_l^1$ and $\mathcal{L} \cap \mathcal{X}_l^0$ are nonempty such that (18) can always be
evaluated.
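
The list construction by bit flipping and the resulting list size can be checked with a few lines of Python. The sketch below works on bit labels only (the mapping to symbol vectors is omitted), and the function name `bit_flip_list` and the initial label `c_init` are hypothetical choices for this illustration.

```python
from itertools import combinations
from math import comb

def bit_flip_list(c, D):
    """Return all bit labels within Hamming distance D of the initial label c."""
    cands = []
    for d in range(D + 1):
        for idx in combinations(range(len(c)), d):
            flipped = list(c)
            for i in idx:
                flipped[i] ^= 1              # flip bit i
            cands.append(tuple(flipped))
    return cands

c_init = (0, 1, 1, 0, 1, 0, 0, 1)            # hypothetical initial label with R_0 = 8
for D in (0, 1, 2, 8):
    cand_list = bit_flip_list(c_init, D)
    # list size equals the sum of binomial coefficients C(R_0, d) for d = 0..D
    assert len(cand_list) == sum(comb(len(c_init), d) for d in range(D + 1))
    print(D, len(cand_list))                 # list sizes 1, 9, 37, 256
```

For $R_0 = 8$ this reproduces the list sizes 9 and 37 quoted for $D = 1$ and $D = 2$, and for $D \ge 1$ every bit position takes both values within the list, so that both minimizations in the list-based LLR are always over nonempty sets.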

Capacity Results. Fig. 5 shows the maximum rates achievable with bit flipping demodulation where
the initial symbol vector estimate is chosen either as the hard ML solution $\hat{x}_{\mathrm{ML}}$ in (12) or the hard
MMSE estimate $Q(\hat{x}_{\mathrm{MMSE}})$. For $D = 1$ ($|\mathcal{L}| = 9$), Fig. 5(a) reveals that flipping 1 bit (labeled
'flip-1') allows for significant performance improvements over the respective initial hard demodulator
(about 2.5 dB at 2 bpcu). For rates below 4 bpcu, hard ML and hard MMSE initialization yield effectively
identical results, close to soft MMSE demodulation. At higher rates MMSE-based bit flipping even
outperforms soft MMSE demodulation slightly. For $D = 2$ ($|\mathcal{L}| = 37$), it can be seen from Fig. 5(b)
that bit flipping demodulation performs close to max-log at low rates and that hard ML and hard MMSE
initialization are very close to each other for rates up to 6 bpcu (SNR up to 7 dB); in fact, below 5 bpcu
hard MMSE initialization performs slightly better than hard ML initialization while at higher rates ML
initialization gives significant gains. To maintain this behavior for larger constellations and more antennas,
the maximum Hamming distance $D$ has to increase with increasing $R_0$ (see [29]).

B. Lattice-Reduction-Aided Demodulation 

Lattice reduction (LR) is an important technique for improving the performance or complexity of 
MIMO demodulators [15], [16] for the case of QAM symbol constellations. The basic underlying idea 
is to view the columns of the channel matrix $H$ as basis vectors of a point lattice. LR then yields
an alternative basis, which amounts to a transformation of the system model (1) prior to demodulation;
the advantage of such an approach is that the transformed channel matrix (i.e., the reduced basis) has
improved properties (e.g., smaller condition number). An efficient algorithm to obtain a reduced basis was
proposed by Lenstra, Lenstra, and Lovász (LLL) [37]. The overall computational complexity of LR-aided
demodulation depends strongly on the complexity of the LLL algorithm which has been assessed in [38]. 
A comparison of different LR methods in the context of MIMO hard demodulation was provided in [39]. 

LR is usually formulated for the equivalent real-valued model; hence, for now we assume all
quantities to be real-valued. Any lattice basis transformation can be formulated in terms of a unimodular
transformation matrix $T$, i.e., a matrix with integer entries and $\det(T) = \pm 1$. Denoting the "reduced
channel" by $\tilde{H} = HT$ and defining $z = T^{-1}x$, the system model (1) can be rewritten as

$$y = Hx + v = \tilde{H}z + v. \tag{19}$$

Under the assumption $x \in \mathbb{Z}^{M_T}$ (which for QAM can be ensured by an appropriate offset and scaling), the
unimodularity of $T$ guarantees $z \in \mathbb{Z}^{M_T}$ and hence any demodulator can be applied to the better-behaved
transformed system model on the right-hand side of (19). Here, we adopt the LR-aided hard-output
MMSE demodulator developed in [16]. LR-aided soft demodulators (cf. [17]) are essentially list-based
[18], [19]. In fact, many of these methods apply bit flipping (cf. Section V-A2) to an LR-aided hard
demodulator output.
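
The role of the unimodular matrix $T$ can be illustrated numerically. The following Python sketch uses the two-dimensional Lagrange-Gauss reduction as a simple stand-in for the LLL algorithm of [37] (it handles only two real-valued basis vectors); the function name `gauss_reduce` and the example channel are assumptions made for this illustration.

```python
import numpy as np

def gauss_reduce(H, max_iter=100):
    """Lagrange-Gauss reduction of a real 2-column basis.

    Returns (H_red, T) with H_red = H @ T and T unimodular.
    """
    B = np.array(H, dtype=float)
    T = np.eye(2)
    for _ in range(max_iter):
        if np.linalg.norm(B[:, 1]) < np.linalg.norm(B[:, 0]):
            B = B[:, ::-1].copy()            # swap the two basis vectors
            T = T[:, ::-1].copy()
        mu = np.rint(B[:, 0] @ B[:, 1] / (B[:, 0] @ B[:, 0]))
        if mu == 0:
            break                            # basis is reduced
        B[:, 1] -= mu * B[:, 0]              # subtract integer multiple of shorter vector
        T[:, 1] -= mu * T[:, 0]
    return B, T

H = np.array([[1.0, 0.9], [0.0, 0.1]])       # hypothetical ill-conditioned channel
H_red, T = gauss_reduce(H)
x = np.array([3.0, -2.0])                    # integer "symbol" vector
z = np.linalg.solve(T, x)                    # z = T^{-1} x, again integer-valued

print(T, np.linalg.det(T))                   # integer entries, determinant +/-1
print(np.linalg.cond(H), np.linalg.cond(H_red))
print(np.allclose(H @ x, H_red @ z))         # both sides of the transformed model agree
```

In this example the reduced basis is orthogonal, so the condition number drops noticeably while the noiseless receive vector is unchanged.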

Capacity Results. Fig. 6 shows the capacity results for hard and soft LR-aided MMSE demodulation.
Soft outputs are obtained by applying bit flipping with $D = 1$ and $D = 2$ to the LR-aided hard MMSE
demodulator output (cf. Section V-A2). It is seen that with hard MMSE demodulation, LR is beneficial
only at SNRs above 8 dB (rates higher than 4.4 bpcu). At rates higher than about 6.8 bpcu, LR-aided
hard demodulation even outperforms soft MMSE demodulation. Bit flipping is helpful particularly at
low rates. Thus, at low rates LR-aided soft MMSE demodulation approaches max-log performance and
outperforms hard ML. When flipping up to $D = 2$ bits, LR-aided soft demodulation shows a significant
performance advantage over soft MMSE demodulation without LR, which is especially pronounced in
the high-rate regime.

Fig. 6. System capacity of LR-aided hard and soft MMSE demodulation (4x4 MIMO, 4-QAM, Gray labeling).

C. Semidefinite Relaxation (SDR) Demodulation 

Semidefinite relaxation makes it possible to approximately solve the hard ML demodulation problem in (12) [40],
[41] with polynomial worst-case complexity based on convex optimization techniques. In this paper, we 
focus on the quasi-ML hard-output demodulator and its soft extension proposed in [13] that approximates 
max-log demodulation (with an overall worst-case complexity that is polynomial in $R_0$). We note that this approach
applies only to BPSK or 4-QAM alphabets and employs a randomization procedure described in detail 
in [41]. 

Capacity Results. In Fig. 7 we show the system capacity for hard and soft SDR demodulation (as
described in [13]) using randomization with 25 trials. It can be seen that hard SDR demodulation coincides
with hard ML at low rates; at high rates it performs worse than hard ML but still better than soft MMSE.
Soft SDR performs consistently better than hard SDR; while at low rates it coincides with soft MMSE
demodulation, at rates higher than 6.2 bpcu its capacity lies between hard ML and soft MMSE.

Fig. 7. System capacity of hard and soft SDR demodulation (4x4 MIMO, 4-QAM, Gray labeling).



D. $\ell^\infty$-Norm Demodulator

It was shown in [11], [12] that the VLSI implementation complexity of hard ML demodulation (cf. (12))
can be significantly reduced by replacing the $\ell^2$ norm in the sphere decoder with the norm
$\|a\|_\infty = \max\{|\mathrm{Re}\{a_1\}|, \ldots, |\mathrm{Re}\{a_M\}|, |\mathrm{Im}\{a_1\}|, \ldots, |\mathrm{Im}\{a_M\}|\}$
(here, $a$ is a complex-valued vector of length $M$).
Specifically, the use of the $\ell^\infty$ norm avoids expensive squaring operations. The hard $\ell^\infty$-norm demodulator
delivers the symbol vector estimate

$$\hat{x}_\infty = \arg\min_{x \in \mathcal{A}^{M_T}} \|Q^H y - Rx\|_\infty. \tag{20}$$

Here, $Q$ and $R$ are the $M_R \times M_T$ unitary and $M_T \times M_T$ upper triangular factors in the QR decomposition
$H = QR$ of the channel matrix. Note that the solution to (20) may not be unique, in which case one
solution is selected at random.

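We propose to generate soft outputs by using the $\ell^\infty$-norm sphere decoder to determine

$$\hat{x}_l^b = \arg\min_{x \in \mathcal{X}_l^b} \|Q^H y - Rx\|_\infty$$

for $b \in \{0, 1\}$ and then evaluating the approximate LLRs using the $\ell^2$ norm:

$$\Lambda_l = \frac{1}{\sigma_v^2} \left[ \|y - H\hat{x}_l^0\|^2 - \|y - H\hat{x}_l^1\|^2 \right].$$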
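
To make the two-step procedure concrete, the following Python sketch evaluates the hard and soft $\ell^\infty$-norm demodulator by exhaustive search. The exhaustive search is a stand-in for the $\ell^\infty$-norm sphere decoder of [11], [12]; the function names `linf_norm` and `linf_demod`, the toy channel, and the deterministic tie-breaking (first minimizer instead of a random one) are assumptions made for this example.

```python
import itertools
import numpy as np

def linf_norm(a):
    """Largest absolute real or imaginary part of the entries of a (no squaring)."""
    a = np.asarray(a, dtype=complex)
    return max(np.abs(a.real).max(), np.abs(a.imag).max())

def linf_demod(y, H, sigma2, cands, labels):
    """Hard and soft l-infinity-norm demodulation by exhaustive search."""
    Q, R = np.linalg.qr(H)                        # reduced QR: Q is M_R x M_T, R is M_T x M_T
    z = Q.conj().T @ y
    d_inf = np.array([linf_norm(z - R @ x) for x in cands])
    d_l2 = np.array([np.linalg.norm(y - H @ x) ** 2 for x in cands])
    x_hard = cands[np.argmin(d_inf)]              # ties resolved by taking the first minimizer
    llr = np.empty(labels.shape[1])
    for l in range(labels.shape[1]):
        i0 = np.flatnonzero(labels[:, l] == 0)
        i1 = np.flatnonzero(labels[:, l] == 1)
        k0 = i0[np.argmin(d_inf[i0])]             # l-infinity minimizer with bit l = 0
        k1 = i1[np.argmin(d_inf[i1])]             # l-infinity minimizer with bit l = 1
        llr[l] = (d_l2[k0] - d_l2[k1]) / sigma2   # LLR evaluated with the l2 norm
    return x_hard, llr

# Toy example: complex 2x2 channel with BPSK (bit 0 -> -1, bit 1 -> +1).
labels = np.array(list(itertools.product([0, 1], repeat=2)))
cands = (2.0 * labels - 1.0).astype(complex)
H = np.array([[1.0 + 0.5j, 0.2 - 0.1j], [0.3j, 0.9 + 0.2j]])
y = H @ np.array([1.0, -1.0]) + np.array([0.02 - 0.01j, -0.03j])
x_hard, llr = linf_demod(y, H, 0.5, cands, labels)
print(x_hard, llr)
```

Only the search uses the $\ell^\infty$ metric; the LLR magnitudes are computed from the $\ell^2$ distances of the two selected candidates.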



Capacity Results. Fig. 8 shows the system capacity for hard and soft $\ell^\infty$-norm demodulation. For the
4x4 case with 4-QAM in Fig. 8(a), hard and soft $\ell^\infty$-norm demodulation perform within 1 dB of hard
ML and max-log, respectively, particularly for high rates. However, at low rates (below 4 bpcu) $\ell^\infty$-norm
demodulation is even outperformed by its corresponding MMSE counterpart. An interesting observation
applies to the 2x4 case with 16-QAM depicted in Fig. 8(b). All soft-output baseline demodulators
perform almost identically and the same is true for all hard-output baseline demodulators, i.e., there is
only a distinction between soft and hard demodulation (cf. the baseline results in Section IV). However, soft and hard $\ell^\infty$-norm
demodulation perform significantly worse in this asymmetric setup, specifically at low to medium rates.
At 2 bpcu, soft $\ell^\infty$-norm demodulation requires 1.75 dB higher SNR than max-log and soft MMSE and
hard $\ell^\infty$-norm demodulation requires 2.6 dB higher SNR than hard ML/MMSE.

E. Successive and Soft Interference Cancelation 

Successive interference cancelation (SIC) is a hard-output demodulation approach that became popular
with the V-BLAST (Vertical Bell Labs Layered Space-Time) system [26]. Within one SIC iteration, only
the layer with the largest post-equalization SNR is detected and its contribution to the received signal
is subtracted (canceled). A SIC implementation that replaces the ZF-based algorithm from [26] with
an MMSE-based scheme and performs efficient layer ordering according to signal-to-interference-plus- 
noise-ratio (SINR) was presented in [27]. Suboptimal but more efficient SIC schemes are discussed in 
[28]. 
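
The detect-and-cancel principle of SIC can be sketched in Python as follows. This is a plain illustration rather than the efficient implementation of [27] or the schemes of [28]: the MMSE filter is recomputed from scratch in every iteration, and the function name `mmse_sic`, the parameter `Es`, and the example channel are assumptions.

```python
import numpy as np

def mmse_sic(y, H, sigma2, alphabet, Es=1.0):
    """Hard-output MMSE-SIC: detect the layer with the largest SINR, cancel it, repeat."""
    remaining = list(range(H.shape[1]))
    x_hat = np.zeros(H.shape[1], dtype=complex)
    r = np.array(y, dtype=complex)
    while remaining:
        Hr = H[:, remaining]
        G = np.linalg.inv(Hr.conj().T @ Hr + (sigma2 / Es) * np.eye(len(remaining)))
        j = int(np.argmin(np.real(np.diag(G))))   # smallest MMSE, i.e., largest SINR
        w = G[j] @ Hr.conj().T                    # MMSE filter (row vector) for that layer
        s = (w @ r) / np.real(w @ Hr[:, j])       # remove the MMSE bias before slicing
        k = remaining.pop(j)
        x_hat[k] = alphabet[np.argmin(np.abs(alphabet - s))]
        r = r - H[:, k] * x_hat[k]                # subtract (cancel) the detected layer
    return x_hat

qam4 = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
H = np.eye(4) + 0.2 * np.triu(np.ones((4, 4)), 1)   # hypothetical well-conditioned channel
x = qam4[[0, 3, 1, 2]]
print(np.allclose(mmse_sic(H @ x, H, 1e-3, qam4), x))   # noise-free sanity check
```

Because each decision is fed back as if it were correct, a wrong early decision corrupts all later layers, which is the error propagation addressed by the soft scheme discussed next.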

In order to mitigate the error propagation inherent to SIC, [24] proposed a parallel soft interference
cancelation (SoftIC) scheme. SoftIC is an iterative method that alternatingly performs (i) parallel MIMO
interference cancelation based on soft symbols and (ii) computation of improved soft symbols using the
output of the interference cancelation stage. Each iteration of this scheme has a complexity that scales
linearly with the number of antennas. Here, we use a modification that builds upon bit-LLRs. Let $\Lambda_i^{(k)}[j]$
denote the LLR for the $i$th bit in layer $k$ obtained in the $j$th iteration. Symbol probabilities can then be
obtained as

$$P_k^{(j)}(x) = \prod_{i=1}^{Q} \frac{\exp\big(b_i(x)\,\Lambda_i^{(k)}[j]\big)}{1 + \exp\big(\Lambda_i^{(k)}[j]\big)},$$

with $b_i(x)$ denoting the $i$th bit in the label of $x \in \mathcal{A}$, leading to the soft symbol estimate

$$\tilde{x}_k^{(j)} = \sum_{x \in \mathcal{A}} x \, P_k^{(j)}(x).$$

Soft interference cancelation for each layer then yields

$$y_k^{(j)} = y - \sum_{k' \neq k} h_{k'} \tilde{x}_{k'}^{(j)} = h_k x_k + \sum_{k' \neq k} h_{k'} \big(x_{k'} - \tilde{x}_{k'}^{(j)}\big) + v, \tag{21}$$

where $h_k$ denotes the $k$th column of $H$. Finally, updated LLRs $\Lambda_i^{(k)}[j+1]$ are calculated from (21) based
on a Gaussian assumption for the residual interference plus noise (for implementation details we refer to
[24]). In contrast to [24], we suggest initializing the scheme with the LLRs obtained by a low-complexity
soft demodulator, e.g., the soft ZF detector described in Section IV-B. The complexity per iteration of
the above SoftIC algorithm is given by $O(2^Q M_T (Q + M_R))$ (plus the operations required for the initial
LLRs).
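
The first two steps of one SoftIC iteration (soft symbols from bit LLRs, followed by parallel cancelation) can be sketched in Python as shown below. The LLR update based on the Gaussian assumption is omitted; the function names, the Gray-labeled 4-QAM mapping, and the example channel are assumptions made for this illustration and are not taken from [24].

```python
import numpy as np

def soft_symbols(llr, alphabet, bits):
    """Soft symbol estimates from bit LLRs.

    llr      : (M_T, Q) bit LLRs per layer
    alphabet : (|A|,) constellation points
    bits     : (|A|, Q) 0/1 labels of the constellation points
    """
    # log P(x) = sum_i [ b_i(x) * LLR_i - log(1 + exp(LLR_i)) ]
    logp = bits @ llr.T - np.logaddexp(0.0, llr).sum(axis=1)   # shape (|A|, M_T)
    return alphabet @ np.exp(logp)                              # mean symbol per layer

def cancel_interference(y, H, x_soft):
    """Row k holds y minus the soft contributions of all layers other than k."""
    total = H @ x_soft
    return np.array([y - total + H[:, k] * x_soft[k] for k in range(H.shape[1])])

bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
alphabet = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
H = np.array([[1.0 + 0.2j, 0.4 - 0.3j], [0.1j, 0.8 + 0.1j]])
x = alphabet[[2, 1]]                           # transmitted symbols (labels 10 and 01)
llr = 30.0 * (2 * bits[[2, 1]] - 1)            # very confident, correct LLRs
x_soft = soft_symbols(llr, alphabet, bits)
y_k = cancel_interference(H @ x, H, x_soft)    # noise-free receive vector
print(np.allclose(x_soft, x), np.allclose(y_k[0], H[:, 0] * x[0]))
```

With reliable LLRs the soft symbols approach the transmitted symbols and the interference term in (21) vanishes; with all-zero LLRs the soft symbols are zero and no interference is removed.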

Capacity Results. In Fig. 9 we display capacity results for (hard) MMSE-SIC with detection ordering
as in [28] (there referred to as 'MMSE-BLAST') and for SoftIC demodulation (initialized using soft ZF 
demodulation, also shown for reference). Hard MMSE-SIC demodulation is seen to perform similarly 
to hard ML demodulation at low rates. While at high rates MMSE-SIC shows a noticeable gap to hard 
ML, it can outperform both soft MMSE and SoftIC in this regime. When using 16-QAM instead of 4- 
QAM, at very low rates MMSE-SIC tends to even slightly outperform hard ML demodulation (see [29]). 
SoftIC (with 8 iterations) is superior to MMSE-SIC up to rates of 6.7bpcu. At low rates, SoftIC even 
performs slightly better than max-log demodulation and essentially coincides with BICM capacity. For 
the chosen system parameters, SoftIC furthermore beats soft MMSE over the whole SNR range shown. 
This statement does not hold in general, however. For example, with 16-QAM the SoftIC performance 
drops below that of soft MMSE demodulation at high rates (see [29]). 

VI. Imperfect Channel State Information 

We next investigate the ergodic system capacity for the case of imperfect channel state information
(CSI). In particular, we consider training-based estimation of the channel matrix $H$ and the noise variance
$\sigma_v^2$ and assess how the amount of training influences the performance of the various demodulators.

Training-based Channel Estimation. In order to estimate the channel, the transmitter sends $N_p > M_T$
training vectors* which are arranged into a full-rank $M_T \times N_p$ training matrix $X_p$. We assume that the
transmit power per channel use for training and actual data is the same such that the Frobenius norm
[42] of $X_p$ satisfies $\|X_p\|_F^2 = N_p E_s$.

Assuming that the channel stays constant for the duration of one block (which contains training and
actual data), the $M_R \times N_p$ receive matrix $Y_p$ induced by the training is given by

$$Y_p = H X_p + V. \tag{22}$$

Here, the $M_R \times N_p$ matrix $V$ contains the noise received during the training period.

Using (22), the least-squares channel estimate (identical to the ML estimate under a Gaussian i.i.d.
assumption for the noise) is computed as [43]

$$\hat{H} = Y_p X_p^H (X_p X_p^H)^{-1}. \tag{23}$$

This estimate is unbiased and its mean square error equals

$$E\{\|\hat{H} - H\|_F^2\} = M_R \sigma_v^2 \, \mathrm{tr}\{(X_p X_p^H)^{-1}\} \ge \frac{M_R M_T^2}{N_p \rho},$$

where the lower bound is attained with orthogonal training sequences, i.e., $X_p X_p^H = \frac{N_p E_s}{M_T} I$ (we recall
that here $\rho = E_s/\sigma_v^2$ denotes the SNR).
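
As a numerical sanity check, the least-squares estimate and its mean square error can be evaluated in Python. The sketch below assumes that $E_s$ is the total transmit power per channel use and builds orthogonal training from rows of a DFT matrix; this particular training design, the function name `ls_channel_estimate`, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

M_T, M_R, N_p, Es, sigma2 = 4, 4, 5, 1.0, 0.1
rng = np.random.default_rng(0)

# Orthogonal training: M_T rows of an N_p-point DFT matrix, scaled so that ||X_p||_F^2 = N_p * Es.
X_p = np.sqrt(Es / M_T) * np.exp(-2j * np.pi * np.outer(np.arange(M_T), np.arange(N_p)) / N_p)

H = (rng.standard_normal((M_R, M_T)) + 1j * rng.standard_normal((M_R, M_T))) / np.sqrt(2)
V = np.sqrt(sigma2 / 2) * (rng.standard_normal((M_R, N_p)) + 1j * rng.standard_normal((M_R, N_p)))
Y_p = H @ X_p + V                                   # received training block

def ls_channel_estimate(Y_p, X_p):
    """Least-squares channel estimate Y_p X_p^H (X_p X_p^H)^{-1}."""
    gram = X_p @ X_p.conj().T
    return Y_p @ X_p.conj().T @ np.linalg.inv(gram)

H_hat = ls_channel_estimate(Y_p, X_p)
mse_theory = M_R * sigma2 * np.real(np.trace(np.linalg.inv(X_p @ X_p.conj().T)))
mse_bound = M_R * M_T**2 / (N_p * Es / sigma2)
print(mse_theory, mse_bound)                # the two values agree for orthogonal training
print(np.linalg.norm(H_hat - H) ** 2)       # squared error of this single realization
```

For non-orthogonal full-rank training the trace expression exceeds the bound, which is why orthogonal sequences are used in the numerical results.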

The estimated channel matrix $\hat{H}$ is then used to obtain the noise variance estimate

$$\hat{\sigma}_v^2 = \frac{1}{M_R (N_p - M_T)} \|Y_p - \hat{H} X_p\|_F^2, \tag{24}$$

which essentially amounts to measuring the mean power of $Y_p$ in the $(N_p - M_T)$-dimensional orthogonal
complement of the range space of $X_p^H$. The noise variance estimate is also unbiased and has MSE

$$E\{|\hat{\sigma}_v^2 - \sigma_v^2|^2\} = \frac{\sigma_v^4}{M_R (N_p - M_T)},$$

which should be noted to be independent of the transmit power.

*While $N_p = M_T$ is sufficient to estimate $H$, extra training is required for estimation of $\sigma_v^2$.
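
The unbiasedness and the MSE of the noise variance estimate can be checked by a small Monte Carlo experiment. The Python sketch below assumes circularly symmetric complex Gaussian noise and the same DFT-based orthogonal training as an illustrative choice; the function names and parameter values are hypothetical.

```python
import numpy as np

M_T, M_R, N_p, Es, sigma2 = 4, 4, 5, 1.0, 1.0
rng = np.random.default_rng(1)
X_p = np.sqrt(Es / M_T) * np.exp(-2j * np.pi * np.outer(np.arange(M_T), np.arange(N_p)) / N_p)

def cgauss(shape, var):
    """Circularly symmetric complex Gaussian samples with the given variance."""
    return np.sqrt(var / 2) * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

def noise_var_estimate(Y_p, X_p):
    """Residual power after removing the least-squares channel fit, normalized as in the text."""
    M_R, N_p = Y_p.shape
    M_T = X_p.shape[0]
    H_hat = Y_p @ X_p.conj().T @ np.linalg.inv(X_p @ X_p.conj().T)
    return np.linalg.norm(Y_p - H_hat @ X_p) ** 2 / (M_R * (N_p - M_T))

est = np.array([
    noise_var_estimate(cgauss((M_R, M_T), 1.0) @ X_p + cgauss((M_R, N_p), sigma2), X_p)
    for _ in range(2000)
])
print(est.mean())                                   # close to sigma2 (unbiased)
print(est.var(), sigma2**2 / (M_R * (N_p - M_T)))   # empirical variance vs. stated MSE
```

With $N_p = 5$ and $M_T = 4$ only $M_R = 4$ residual dimensions are available, so the estimate fluctuates strongly; this is the minimum-training case considered in the numerical results.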

The estimates $\hat{H}$ in (23) and $\hat{\sigma}_v^2$ in (24) then replace the true channel matrix and noise variance in
the computation of the demodulator outputs, i.e., we consider mismatched demodulation. One could also 
change the metric in order to take into account the channel imperfections as in [44]; however, this is 
beyond the scope of this paper. 

Capacity Results. This section provides numerical results for the ergodic system capacity of max-log,
hard ML, and soft MMSE demodulation with perfect and estimated CSI (using orthogonal training
sequences). Throughout, a 4 x 4 MIMO system with 4-QAM and Gray labeling is considered ($R_0 = 8$).
Results for other demodulators with imperfect CSI are provided in [29].

Fig. 10 shows the maximum achievable rates versus SNR for a fixed training sequence length of
$N_p = 5$ (this corresponds to the worst case with minimum amount of training). It is seen that for all
three detectors imperfect CSI results in a significant performance loss, e.g., at 4 bpcu the SNR loss for
max-log, hard ML, and soft MMSE is 3.9 dB, 3.1 dB, and 4.95 dB, respectively. In fact, in the considered
worst-case imperfect CSI setup the performance advantage of soft MMSE demodulation over hard ML
demodulation at low rates is much less pronounced; note that the cross-over between hard ML and soft
MMSE performance at an SNR of about 6.3 dB shifts from 4.9 bpcu for perfect CSI to 3 bpcu for the
case of imperfect CSI. The performance losses for all demodulators tend to be smaller at high rates,
which may be partly attributed to the fact that CSI becomes more accurate with increasing SNR whereas
this is not the case for the noise variance estimate. In general it can be observed that the performance
loss of hard ML is the smallest while soft MMSE and max-log performance deteriorate much more strongly;
note that hard ML does not use the noise variance and hence is more robust to estimation errors in $\sigma_v^2$.
Here, hard ML comes within less than 1.3 dB of max-log performance. In contrast, soft MMSE uses the
imperfect channel and the noise variance estimate in the MMSE equalization stage (cf. (16)) and in the
LLR calculation and is thus most strongly affected.

Fig. 10. System capacity of baseline demodulators with perfect and imperfect CSI (4x4 MIMO, 4-QAM, Gray labeling,
$N_p = 5$ training vectors).

To investigate the impact of the amount of training, Fig. 11(a) and (b) depict the minimum SNRs
required by the individual demodulators to achieve target rates of 2 bpcu and 6 bpcu, respectively, versus
the number of training vectors $N_p$. It is seen that for all demodulators, the required SNR decreases rapidly
with increasing amount of training. Yet, even for $N_p = 20$ there is a significant gap of 1 to 2 dB to perfect
CSI performance (indicated by corresponding horizontal, gray lines). Here, soft MMSE outperforms hard
ML at 2 bpcu and gets closer to max-log performance with a larger amount of training. In contrast, at
6 bpcu hard ML performs consistently better than soft MMSE.

We note that further capacity results for the demodulators discussed in Section V are provided in [29].
One of the more interesting observations obtained from those results is that for training duration $N_p = 5$,
the LSD with list size $|\mathcal{L}| \ge 8$ consistently outperforms max-log (at least in the MIMO setup considered).
Although this is not the case for larger training duration, LSD with $|\mathcal{L}| = 8$ performs mostly very close
to max-log.

VII. Quasi-static Fading

In this section we provide a demodulator performance comparison for quasi-static fading MIMO
channels based on the outage probability $P_{\mathrm{out}}(R)$ and the $\epsilon$-capacity in (10). The setup considered
(4x4 MIMO with Gray-labeled 4-QAM) is the same as before apart from the spatially i.i.d. Rayleigh
fading channel which now is assumed to be quasi-static. The outage probability $P_{\mathrm{out}}(R)$ was measured
using 10^ blocks (affected by independent fading realizations), each consisting of 10^ symbol vectors. 
To keep the presentation concise, we restrict ourselves to the baseline demodulators from Section IV.
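
Given per-block rate values, the empirical outage probability and $\epsilon$-capacity can be computed as in the following Python sketch. It assumes the common definitions (outage probability as the probability that the rate supported by a fading block falls below the target rate $R$, and $\epsilon$-capacity as the largest rate whose outage probability does not exceed $\epsilon$); the function names and the sample values are made up for illustration and are not simulation results.

```python
import numpy as np

def outage_probability(block_rates, R):
    """Fraction of fading blocks whose supported rate falls below the target rate R."""
    return float(np.mean(np.asarray(block_rates) < R))

def epsilon_capacity(block_rates, eps):
    """Largest rate R whose empirical outage probability P(rate < R) is at most eps."""
    r = np.sort(np.asarray(block_rates, dtype=float))
    k = min(int(np.floor(eps * r.size)), r.size - 1)
    return float(r[k])

rates = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])   # made-up per-block rates in bpcu
print(outage_probability(rates, 3.5))    # 0.375
print(epsilon_capacity(rates, 0.25))     # 3.0
```

In a simulation, each entry of `rates` would be the system capacity measured for one fading realization and one demodulator.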

Fig. 12(a) shows the outage probability versus SNR $\rho$ for target rates of $R$ = 2 bpcu and $R$ = 6 bpcu.
For $R$ = 2 bpcu, it is seen that the gap between optimum soft MAP demodulation (labeled 'BICM' for
consistency with previous sections) and max-log demodulation equals about 0.5 dB. In such a low-rate
regime, soft MMSE performs as well as max-log and about 2.5 dB better than hard ML. While max-log,
hard ML, and soft MMSE demodulation all achieve full diversity (cf. the slope of the corresponding
outage probability curves), soft ZF only has diversity order one, resulting in a huge performance loss
(almost 19 dB at $P_{\mathrm{out}}(R)$ = 10~^). At $R$ = 6 bpcu the situation is quite different: here, max-log coincides
with soft MAP and hard ML loses only 1.4 dB (again, those three demodulators achieve full diversity).
Surprisingly, soft MMSE breaks down significantly at this rate, losing all diversity. At $P_{\mathrm{out}}(R)$ = 10"^,
the SNR loss of soft MMSE and soft ZF relative to max-log equals about 11 dB and 19 dB, respectively.

The degradation of soft MMSE with increasing rate is clearly visible in Fig. 12(b), which shows
$\epsilon$-capacity versus SNR for a maximum outage probability of $\epsilon$ = 10~^. Some of the behavior of the
$\epsilon$-capacity curves is qualitatively very similar to the results obtained for the ergodic capacity of the
baseline demodulators (cf. Section IV):
soft MMSE demodulation outperforms hard ML demodulation (by up to 2.2 dB for rates less than
3.8 bpcu) while at high rates it is the other way round. Furthermore, for low rates soft MMSE demodulation
essentially coincides with max-log whereas at high rates it approaches soft ZF performance.

Comparison of Fig. 12(b) with the ergodic capacity results for the baseline demodulators (cf. Section IV)
even suggests that there is a connection between the diversity
of the demodulators in the quasi-static scenario and their SNR gap to optimum demodulation in the
ergodic scenario. For all rates (SNRs), the max-log and hard ML demodulator both achieve constant
(full) diversity in the quasi-static regime and maintain a roughly constant gap to soft MAP in the ergodic
scenario. On the other hand, the diversity of soft MMSE demodulation in the quasi-static case and its
SNR gap to soft MAP in the ergodic scenario both deteriorate with increasing rate/SNR. Note that also
here soft ZF demodulation performs worst, maintaining a diversity equal to 1 for all rates.

Fig. 12. Demodulator performance in quasi-static fading: (a) outage probability versus SNR for $R$ = 2 bpcu and $R$ = 6 bpcu,
and (b) $\epsilon$-capacity versus SNR for $\epsilon$ = 10"^ (4x4 MIMO, 4-QAM, Gray labeling).



VIII. Key Observations and Design Guidelines 

Based on the results of the previous sections, we summarize some key observations and provide 
practical guidelines for system design. 

Soft MMSE demodulation was seen to approach max-log performance for low rates, both in the ergodic 
and the quasi-static regime and for various system configurations (see also [29]). Moreover, soft MMSE 
demodulation is very attractive since it has the lowest complexity among all the suboptimal demodulators 
discussed in this work. Therefore, soft MMSE demodulation is arguably the demodulator of choice when 
designing MIMO-BICM systems with outer codes of rate less than 1/2. For the case of imperfect CSI, 
however, care should be taken to ensure a sufficient amount of training since otherwise soft MMSE may 
suffer severely from inaccurate estimates of the channel matrix and the noise variance. We note that soft 
ZF demodulation performed consistently worse than soft MMSE at the same computational cost; thus,
there appear to be no reasons to prefer soft ZF in practical implementations. 

Soft MMSE has an even stronger case against its competitors for asymmetric MIMO systems (i.e.,
$M_T < M_R$), where it performs close to BICM capacity for all rates. Fig. 13 compares a 4 x 4 MIMO
system using 4-QAM (system I) and a 2 x 4 system using 16-QAM (system II), both using Gray labeling
and achieving $R_0 = 8$. Whereas at low rates soft MMSE demodulation performs better with system I
than with system II, it is the other way around for high rates. For example, at 6 bpcu system II requires
1.25 dB less SNR than system I in spite of using fewer active transmit antennas. This observation is of
interest when designing MIMO-BICM systems with adaptive modulation and coding. Specifically, with
soft MMSE demodulation below 4 bpcu system I is preferable, whereas above 4 bpcu it is advantageous
to deactivate two transmit antennas and switch to the 16-QAM constellation (system II). We note that
with max-log and hard ML demodulation, system II performs consistently worse than system I.

Fig. 13. System capacity of baseline demodulators for a 4x4 MIMO system using 4-QAM and a 2x4 MIMO system using
16-QAM (both with Gray labeling).
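
As a quick check of the rate bookkeeping for the two systems, assume the usual relation $R_0 = M_T \log_2 |\mathcal{A}|$ for the number of code bits per channel use (this relation is not restated in this section):

```latex
\begin{align*}
\text{System I:}\quad  R_0 &= M_T \log_2 |\mathcal{A}| = 4 \cdot \log_2 4 = 8, \\
\text{System II:}\quad R_0 &= M_T \log_2 |\mathcal{A}| = 2 \cdot \log_2 16 = 8.
\end{align*}
```

Both configurations therefore carry the same number of code bits per channel use, so the comparison isolates the effect of trading transmit antennas against constellation size.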

The only regime where soft MMSE suffers from a noticeable performance loss is symmetric systems
at high rates, where no low-complexity demodulation scheme is able to achieve max-log performance.
These observations apply to the ergodic and quasi-static scenario. However, with perfect CSI the LSD and
soft $\ell^\infty$-norm demodulation come reasonably close to max-log. The LSD has the additional advantage
of being able to trade off performance for complexity reduction. Furthermore, note that for system I hard
ML (i.e., the LSD with list size 1) outperforms most suboptimal soft demodulators for rates larger than
6 bpcu.

As a general rule of thumb, we conclude that at low rates linear soft demodulation is sufficient 
and generally preferable due to its low computational complexity. At high rates non-linear demodulator 
structures provide better performance, even when they deliver hard rather than soft outputs. If complexity 
is no issue, max-log demodulation is mostly superior to all other demodulators since it yields the highest 
rates over a wide range of system parameters and SNRs. Notable exceptions are SoftIC (which is a
low-complexity demodulator) and LSD, which have the potential to outperform max-log at low rates and for
imperfect CSI, respectively.

IX. Conclusion

We provided a comprehensive performance assessment and comparison of soft and hard demodulators 
for non-iterative MIMO-BICM systems. Our comparison is based on the information-theoretic notion of 
system capacity, which can be interpreted as the maximum achievable rate of the equivalent "modulation" 
channel that comprises modulator, physical channel, and demodulator. As a performance measure, 
system capacity has the main advantage of being independent of any outer code. Extensive simulation 
results show that a universal demodulator performance ranking is not possible and that the demodulator 
performance can depend strongly on the rate (or equivalently the SNR) at which a system operates. In 
addition to ergodic capacity results, we investigated the non-ergodic fading scenario in terms of outage 
probability and e-capacity and analyzed the robustness of certain demodulators under imperfect channel 
state information. Our observations provide new insights into the design of MIMO-BICM systems (i.e., 
choice of demodulator, number of antennas, and symbol constellation). Moreover, our approach sheds 
light on issues that have not been apparent in the previously prevailing BER performance comparisons 
for specific outer codes. For example, a key observation is that with low-rate outer codes soft
MMSE demodulation is preferable over other demodulators since it has low complexity but nonetheless
comes very close to max-log performance.